Supplementary: An Integrative Approach to Predicting the Functional Effects of Non-Coding and Coding Sequence Variation

Abstract

A. 46-Way Sequence Conservation: variants within highly conserved regions are likely to have more impact than those within highly variable regions; therefore we used two measures of evolutionary conservation, namely PhastCons (Siepel et al., 2005) and PhyloP (Pollard et al., 2010) scores, obtained from the multiple sequence alignment (at the nucleotide level) of 46 vertebrate genomes to the human genome (Blanchette et al., 2004). In addition to these scores, we constructed ab initio hidden Markov models (HMMs) representing these alignments using the HMMER software package (version 3.1b1) and extracted the relative probabilities of each nucleotide at the corresponding position within the alignment. Following our previous work (Shihab et al., 2013a,b, 2014), we also included a measure of the magnitude of effect given the SNV (i.e. the log-odds ratio of observing both nucleotides).

B. Histone Modifications (ChIP-Seq): we used ChIP-Seq peak calls for 14 histone modifications across 45 cell lines from ENCODE (The ENCODE Project Consortium, 2012).

C. Transcription Factor Binding Sites (TFBS PeakSeq): we used PeakSeq (Rozowsky et al., 2009) peak calls for 119 transcription factors across 77 cell lines from ENCODE.

D. Open Chromatin (DNase-Seq): we used DNase-Seq peak calls across 119 cell lines from ENCODE.

E. 100-Way Sequence Conservation: we used features similar to those described in A, but obtained from the multiple sequence alignment of 100 vertebrate genomes to the human genome. We considered both 100-way (E) and 46-way (A) sequence conservation to highlight any gain that could be made by including more species in the comparison.

F. GC Content: we used a single measure of GC content, calculated using a span of 5 nucleotide bases, from the UCSC Genome Browser (Kent et al., 2002).

G. Open Chromatin (FAIRE): we used Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) peak calls across 119 cell lines from ENCODE.

H. Transcription Factor Binding Sites (TFBS SPP): we used SPP (Kharchenko et al., 2008) peak calls for 119 transcription factors across 77 cell lines from ENCODE.

I. Genome Segmentation: we used 7 genome-segmentation states in 6 cell lines, taken from a consensus merge of the segmentations produced by ChromHMM (Ernst and Kellis, 2010) and Segway (Hoffman et al., 2012).

J. Footprints: we used annotations describing DNA footprints across 41 cell types from ENCODE.
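
As a rough illustration of two of the simpler features above (the log-odds ratio in A and the 5-base GC content in F), the following Python sketch may help. It is not the authors' implementation: the function names, the example probabilities, the clipping constant `eps`, the exact form of the log-odds ratio, and the use of non-overlapping windows are all assumptions made for illustration, since this section does not specify them.

```python
import math


def log_odds_ratio(probs, ref, alt, eps=1e-9):
    """Log of the odds of the reference base over the odds of the alternate base.

    `probs` maps each nucleotide to its relative probability at one alignment
    position (for example, taken from an HMM built on the alignment).
    ASSUMPTION: the exact formula is not given in the text; this is one
    plausible reading of "log-odds ratio of observing both nucleotides".
    """
    # Clip to (0, 1) so the logarithms stay finite.
    p_ref = min(max(probs[ref], eps), 1 - eps)
    p_alt = min(max(probs[alt], eps), 1 - eps)
    return math.log(p_ref / (1 - p_ref)) - math.log(p_alt / (1 - p_alt))


def gc_percent(seq, span=5):
    """GC percentage in consecutive, non-overlapping windows of `span` bases.

    ASSUMPTION: non-overlapping windows; trailing bases that do not fill a
    whole window are ignored.
    """
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + span]) / span
        for i in range(0, len(seq) - span + 1, span)
    ]


# Hypothetical probabilities for a well-conserved position.
column = {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05}
score = log_odds_ratio(column, ref="A", alt="G")

# Hypothetical 15-base sequence, giving three 5-base windows.
windows = gc_percent("GCGCGATATAGGCCA")
```

Under these assumptions, a positive `score` means the reference base is favoured over the alternate base at that position, and swapping `ref` and `alt` flips the sign.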


Related articles

Genome analysis: An integrative approach to predicting the functional effects of non-coding and coding sequence variation

Motivation: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes...



Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is only one of the functional products of the eukaryotic genome. Indeed, a larger part of the human genome is transcribed into non-coding sequences than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...


Tracking the Sequences of Patient-Therapist Dialogues by Coding Responses during Integrative Psychotherapy

Aim: The Assimilation of Problematic Experiences Scale (APES) for the coding of client responses and the Process Focused Conversation Analysis (PFCA) for coding therapist responses were applied to transcripts of a successful case of integrative psychotherapy of depression. Methods: the research method of the present research was case study. Dialogues (150) between a therapist and client in one ...


Long non-coding RNAs and their significance in human diseases

Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...


An evidence based care package to improve motor skills of infants living in foster care according to integrative review approach

Background: Infancy is the most important extrauterine period of brain development, and it requires environmental stimulation for the expression of developmental capabilities. Meanwhile, due to repeated environmental disparities, children in foster care are at risk of developmental delay. Aim: to design an evidence-based care package to improve the motor skills of orphaned infants according to inte...



Publication date: 2017